Skip to content

perf(gui): Significantly reduce cost of 2d render elements#2514

Open
githubawn wants to merge 26 commits intoTheSuperHackers:mainfrom
githubawn:perf/gui-batch
Open

perf(gui): Significantly reduce cost of 2d render elements#2514
githubawn wants to merge 26 commits intoTheSuperHackers:mainfrom
githubawn:perf/gui-batch

Conversation

@githubawn
Copy link
Copy Markdown

@githubawn githubawn commented Mar 31, 2026

Reduces draw calls by batching 2D elements by texture state in setup2DRenderState.

Environment Scenario (-quickstart) Before After
Main Machine Skirmish Screen 477 FPS 918 FPS
M1 Macbook (Wine) Skirmish Screen 16 FPS 61 FPS
M1 Macbook (Wine) In-Game (Match Start) 22 FPS 33 FPS
Main Machine Selecting a thousand units in skirmish 45 FPS 56 FPS

Added early-out clipping and refined bounds checks to skip rendering objects outside the active region.

Optimized HotKey rendering by removing a redundant Render() call.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR introduces a 2D draw-call batching system that groups consecutive render operations sharing the same texture, blend mode, and grayscale state, flushing the accumulated geometry only when the state changes or the batch ends. It also adds an early-out bounds check at the top of drawImage and refines the inner clipping condition so that images fully within the clip region skip the clipping computation entirely, and removes a redundant Render() call for hot-key text.

Confidence Score: 5/5

Safe to merge; no P0/P1 issues found and the performance gains are substantial.

Texture ref-counting is correct across all batching/non-batching/raw-texture paths. The re-entrance guard on beginBatch/endBatch prevents nested-batch corruption. The clipping early-out is placed before any resource acquisition so there are no leaks on early-return paths. The only findings are P2 style suggestions.

No files require special attention.

Important Files Changed

Filename Overview
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Core of the PR: adds setup2DRenderState, onBeginBatch/onEndBatch/onFlush, and wires all draw* methods into the batch. Texture ref management via Get_Texture/Add_Ref/Release_Ref is correct across all batching/non-batching paths.
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Mirror of the GeneralsMD changes for the vanilla game build; identical logic and same minor style note applies.
GeneralsMD/Code/GameEngine/Source/GameClient/Display.cpp Adds beginBatch/endBatch/flush base-class implementations with correct re-entrance guard.
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplayString.cpp Moves hotkey text render after TheDisplay->flush() to commit the 2D batch before text rendering; no-op outside batch mode.
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DInGameUI.cpp Wraps the InGameUI draw pass in beginBatch/endBatch; re-entrance guard prevents double-nesting.
Core/GameEngineDevice/Source/W3DDevice/GameClient/W3DView.cpp Wraps iterateDrawablesInRegion with beginBatch/endBatch; clean integration.
GeneralsMD/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h Adds TextureClass forward declaration and new batch member fields; all initialized in the constructor.
GeneralsMD/Code/GameEngine/Include/GameClient/Display.h Adds public batching API and protected virtual hooks; m_isBatching initialized in Display() constructor.

Sequence Diagram

sequenceDiagram
    participant Caller as W3DView / InGameUI
    participant Disp as Display (base)
    participant W3D as W3DDisplay
    participant R2D as Render2DClass

    Caller->>Disp: beginBatch()
    Disp->>W3D: onBeginBatch()
    W3D->>R2D: Reset()
    note over W3D: m_batchNeedsInit=TRUE<br/>m_batchTexture=nullptr

    loop Each draw* call
        Caller->>W3D: drawImage / drawLine / ...
        W3D->>W3D: setup2DRenderState(tex, mode, grayscale)
        alt Same state as current batch
            W3D-->>W3D: early-return (no flush)
        else State changed
            W3D->>W3D: onFlush()
            W3D->>R2D: Render() + Reset()
            W3D->>W3D: swap tex ref, update mode/grayscale
        end
        W3D->>R2D: Add_Quad / Add_Line / Add_Tri
    end

    Caller->>Disp: endBatch()
    Disp->>W3D: onFlush()
    W3D->>R2D: Render() + Reset()
    Disp->>W3D: onEndBatch()
    W3D->>W3D: REF_PTR_RELEASE(m_batchTexture)
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp
Line: 2739

Comment:
**Long clipping condition on one line**

This is a very long line (~125 chars) that mixes all four boundary checks together. In the early-out block at the top of the function the same four conditions are already split across four indented lines — the inner clipping check would benefit from the same treatment for consistency and debuggability (each sub-condition can have its own breakpoint).

```suggestion
		if (screen_rect.Left   < m_clipRegion.lo.x ||
			screen_rect.Right  > m_clipRegion.hi.x ||
			screen_rect.Top    < m_clipRegion.lo.y ||
			screen_rect.Bottom > m_clipRegion.hi.y)
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp
Line: 2630

Comment:
**Long clipping condition on one line (Generals mirror)**

Same long single-line clipping check as in the GeneralsMD copy — worth splitting onto four lines for consistency with the early-out block directly above it.

```suggestion
		if (screen_rect.Left   < m_clipRegion.lo.x ||
			screen_rect.Right  > m_clipRegion.hi.x ||
			screen_rect.Top    < m_clipRegion.lo.y ||
			screen_rect.Bottom > m_clipRegion.hi.y)
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (21): Last reviewed commit: "implemented feedback" | Re-trigger Greptile

Comment thread GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
@githubawn
Copy link
Copy Markdown
Author

githubawn commented Mar 31, 2026

Significantly improves #109
Fully fixing this requires significant refactoring to Render2DSentenceClass for fonts

optimize clipping check
Comment thread GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread GeneralsMD/Code/GameEngine/Source/GameClient/Display.cpp Outdated
return statement on seperate line
centralized flush
@githubawn
Copy link
Copy Markdown
Author

@greptileai

"One P1 render-state initialization bug in onBeginBatch() should be fixed before merging to avoid potential cross-frame visual corruption."

The P1 finding could be a false positive based on a local variable analysis that omits the downstream hardware synchronization in Render2DClass::Render() (render2d.cpp:601).

Render2DClass is a self-synchronizing state machine. Its Render() call (line 628 for textures, line 694 for shaders) explicitly pushes its internal members to the DX8Wrapper on every execution. The tracking variables in W3DDisplay (m_batchTexture, etc.) are mirror-images of Render2DClass's internal members.

happy?

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 31, 2026

No, the P1 finding stands.

Your argument about Render2DClass being self-synchronizing addresses hardware state consistency, but not the C++ undefined behavior in W3DDisplay.

The issue:

  • onFlush() reads !m_batchNeedsInit (line 617 of your diff)
  • setup2DRenderState() reads !m_batchNeedsInit (line 627)
  • If uninitialized, these branches are unpredictable

Even if Render2DClass correctly synchronizes hardware state on every Render() call, W3DDisplay's control flow still depends on m_batchNeedsInit to decide:

  1. Whether to call Render() in onFlush()
  2. Whether to skip redundant state setup in setup2DRenderState()

An uninitialized bool means these decisions are based on garbage. The fact that the hardware might recover downstream doesn't make reading uninitialized memory defined behavior.

The fix is one line. Keep it.

@Mauller
Copy link
Copy Markdown

Mauller commented Apr 1, 2026

I see greptile woke up on the sassy side of the motherboard.

@Mauller
Copy link
Copy Markdown

Mauller commented Apr 1, 2026

Reduces draw calls by batching 2D elements by texture state in setup2DRenderState.

Environment Scenario (-quickstart) Before After
Main Machine Skirmish Screen 477 FPS 918 FPS
M1 Macbook (Wine) Skirmish Screen 16 FPS 61 FPS
M1 Macbook (Wine) In-Game (Match Start) 22 FPS 33 FPS

Another test would be to get a lot of units on screen then select them all so their unit info is being drawn.

The health bars make use of the line drawing classes which i believe are inherently affected by this change.

@githubawn
Copy link
Copy Markdown
Author

I was initially going to say that health bars render in drawViews() (outside this batch), but your comment made me realize it's only a 6-line change to wrap that too.

Jumped from 45 FPS to 56 FPS.

image

@Mauller
Copy link
Copy Markdown

Mauller commented Apr 1, 2026

I was initially going to say that health bars render in drawViews() (outside this batch), but your comment made me realize it's only a 6-line change to wrap that too.

Jumped from 45 FPS to 56 FPS.
image

Nice ~30% improvement there

@Skyaero42
Copy link
Copy Markdown

Compile errors need to be fixed before this can be reviewed

Comment thread Generals/Code/GameEngine/Source/GameClient/Display.cpp
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplayString.cpp Outdated
@githubawn githubawn force-pushed the perf/gui-batch branch 2 times, most recently from 33ccf35 to 5dd0e6e Compare April 6, 2026 18:53
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplayString.cpp Outdated
Comment thread GeneralsMD/Code/GameEngine/Include/GameClient/Display.h Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
@xezon xezon changed the title perf(gui): implement batched UI rendering perf(gui): Reduce cost of 2d render elements by 30% to 70% Apr 14, 2026
@xezon xezon added GUI For graphical user interface Major Severity: Minor < Major < Critical < Blocker Performance Is a performance concern Gen Relates to Generals ZH Relates to Zero Hour labels Apr 14, 2026
@xezon xezon changed the title perf(gui): Reduce cost of 2d render elements by 30% to 70% perf(gui): Significantly reduce cost of 2d render elements Apr 14, 2026
Copy link
Copy Markdown

@xezon xezon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice. Can you measure the performance improvement for just the 2D rendering on its own? I expect it will be somewhere north of 95%.

Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngine/Include/GameClient/Display.h Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Comment thread Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Outdated
Copy link
Copy Markdown

@xezon xezon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much gains does the 2d rendering have now on its own?

Comment thread Generals/Code/GameEngine/Include/GameClient/Display.h Outdated
@xezon
Copy link
Copy Markdown

xezon commented Apr 18, 2026

Interestingly the replays are failing

@githubawn
Copy link
Copy Markdown
Author

The replays look like an CI issue.

My profiler is currently crashing, so I can't give you a function-level breakdown of the 2D render path alone.

Best I have for now is indications:
The Skirmish screen showing 918 fps instead of 477 fps when using -quickstart.
When in a match with the 3d viewport disabled it jumped from 2200-3300 fps to 2700-3700 fps.
Selecting 1,000 hackers caused an immediate drop from 56 FPS to 45 FPS, with this PR, the stays near the same even with all 1,000 units selected.

Comment thread GameClient_Reference
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You submitted a mysterious file here. Perhaps this fails the Replay check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Gen Relates to Generals GUI For graphical user interface Major Severity: Minor < Major < Critical < Blocker Performance Is a performance concern ZH Relates to Zero Hour

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants